Ekdeep Singh Lubana

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

May 28, 2026

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

May 12, 2026

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

May 06, 2026

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

Feb 11, 2026

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?

Dec 17, 2025

Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

Nov 06, 2025

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Aug 07, 2025

Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders

Jun 24, 2025

Detecting High-Stakes Interactions with Activation Probes

Jun 12, 2025